Learning K-way D-dimensional Discrete Code For Compact Embedding Representations

Authors

  • Ting Chen
  • Martin Renqiang Min
  • Yizhou Sun
Abstract

Embedding methods such as word embedding have become pillars for many applications involving discrete structures. Conventional embedding methods directly associate each symbol with a continuous embedding vector, which is equivalent to applying a linear transformation based on a “one-hot” encoding of the discrete symbols. Despite its simplicity, such an approach yields a number of parameters that grows linearly with the vocabulary size and can lead to overfitting. In this work we propose a much more compact K-way D-dimensional discrete encoding scheme to replace the “one-hot” encoding. In “KD encoding”, each symbol is represented by a D-dimensional code, each dimension of which has a cardinality of K. The final symbol embedding vector is generated by composing the code embedding vectors. To learn semantically meaningful codes, we derive a relaxed discrete optimization technique based on stochastic gradient descent. By adopting the new coding system, the efficiency of parameterization can be improved significantly (from linear to logarithmic in the vocabulary size), which also mitigates overfitting. In our experiments with language modeling, the number of embedding parameters can be reduced by 97% while achieving similar or better performance.
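To make the parameter-count argument concrete, the following is a minimal NumPy sketch, not the authors' implementation: each symbol is mapped to a D-dimensional code over {0, ..., K-1}, and its embedding is composed from the corresponding code-embedding vectors. All sizes are illustrative rather than taken from the paper's experiments, and the additive composition is an assumption; the paper learns the codes and composes their embeddings with a learnable function.

    import numpy as np

    # Illustrative sizes only (not the paper's experimental settings).
    V = 10000   # vocabulary size
    K = 16      # cardinality of each code dimension
    D = 8       # code length; K**D = 16**8 >> V, so every symbol can get a unique code
    d = 64      # embedding dimensionality

    rng = np.random.default_rng(0)

    # Conventional ("one-hot") embedding: one vector per symbol -> V * d = 640,000 floats.
    dense_table = rng.normal(size=(V, d))

    # KD encoding: each symbol stores only a D-dimensional discrete code, plus a
    # shared table of K * D code-embedding vectors -> K * D * d = 8,192 floats.
    codes = rng.integers(0, K, size=(V, D))   # random here; learned in the paper
    code_table = rng.normal(size=(D, K, d))   # one K x d embedding table per code dimension

    def kd_embed(symbol_id: int) -> np.ndarray:
        # Look up the symbol's code and sum its D code-embedding vectors.
        # Summation is an assumed, simplest-case composition; the paper uses a
        # learnable composition function.
        c = codes[symbol_id]                            # shape (D,)
        return code_table[np.arange(D), c].sum(axis=0)  # shape (d,)

    vec = kd_embed(42)
    print(vec.shape)          # (64,)
    print(V * d, K * D * d)   # 640000 vs 8192 float parameters (~99% fewer)

Since D only needs to satisfy K**D >= V, i.e. D on the order of log_K(V), the continuous parameters grow logarithmically with the vocabulary size, which is the linear-to-logarithmic improvement the abstract refers to; the discrete codes themselves cost only V * D small integers.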


Similar Articles

Max Margin Dimensionality Reduction

A fundamental problem in machine learning is to extract compact but relevant representations of empirical data. Relevance can be measured by the ability to make good decisions based on the representations, for example in terms of classification accuracy. Compact representations can lead to more human-interpretable models, as well as improve scalability. Furthermore, in multi-class and multi-tas...


Supervised Manifold Learning for Media Interestingness Prediction

In this paper, we describe the models designed for automatically selecting multimedia data, e.g., image and video segments, which are considered to be interesting for a common viewer. Specifically, we utilize an existing dimensionality reduction method called Neighborhood MinMax Projections (NMMP) to extract the low-dimensional features for predicting the discrete interestingness labels. Meanwh...


Taste Space Versus the World: an Embedding Analysis of Listening Habits and Geography

Probabilistic embedding methods provide a principled way of deriving new spatial representations of discrete objects from human interaction data. The resulting assignment of objects to positions in a continuous, low-dimensional space not only provides a compact and accurate predictive model, but also a compact and flexible representation for understanding the data. In this paper, we demonstrate...


Variable Elimination in the Fourier Domain

The ability to represent complex high dimensional probability distributions in a compact form is one of the key insights in the field of graphical models. Factored representations are ubiquitous in machine learning and lead to major computational advantages. We explore a different type of compact representation based on discrete Fourier representations, complementing the classical approach base...


Variable Elimination in Fourier Domain

Probabilistic inference is a key computational challenge in statistical machine learning and artificial intelligence. The ability to represent complex high-dimensional probability distributions in a compact form is the most important insight in the field of graphical models. In this paper, we explore a novel way to exploit compact representations of high-dimensional probability distributions in ...



Journal:
  • CoRR

Volume: abs/1711.03067

Publication date: 2017